The present invention deals with text processing. More specifically, the present invention deals with identifying, extracting, clustering and analyzing sentiment-bearing text.
In the past, discerning the sentiment of large volumes of text has been very time consuming. In addition, providing information indicative of the sentiment, and processing that information in any relatively granular way, has also been very cumbersome and difficult. One example of this type of information, which will be discussed here for purposes of illustration, is customer review or customer feedback information, which provides customer sentiment as to particular products or services that the customer has purchased.
Customer reviews of products can contain very important and beneficial information for both consumers of the product and for the company manufacturing and selling the product. A consumer may wish to review such information in order to make a purchasing decision. For instance, the consumer may wish to determine what other purchasers of the product think of the product, and specifically what other purchasers might think of certain specific features of the product. If the consumer were buying an automobile, for instance, and the most important thing to the consumer was the handling of the automobile (as opposed to, for example, the aesthetics of the automobile), it would be very beneficial to the consumer to have access to customer reviews of that feature (e.g., the handling) of that particular product (e.g., the particular make and model of automobile of interest to the consumer).
This type of information can also be very useful to the company manufacturing and selling the product. The company may wish to know what consumers like and dislike about the product in order to redesign the product, or simply in order to generate a desired marketing campaign or to further define a target consumer for the product. For instance, an executive at an automobile manufacturer may wish to know what consumers most like, and most dislike, about certain makes and models of the automobiles being manufactured and sold by the company. This can help the executive to make decisions in redesigning and marketing those makes and models of automobiles.
In the past, it has been very difficult to review and gain meaningful insight into this type of information to determine exactly how consumers perceive products. This is particularly true given the ease with which consumers can provide feedback in the age of electronic communication. The volume of accumulated data containing consumer feedback regarding products is very large. Obtaining any type of meaningful information from that feedback data has required human analysis of the feedback. Humans have been required to read all of the information, or a sample of it, and then generate certain types of summaries or reports. Of course, this can be very time consuming and expensive, particularly given the volume of consumer feedback for certain products.
The following is simply an exemplary list of the sources which can provide consumer feedback: electronic mail; electronic feedback channels provided at a company's website; chat room text devoted to the products of a given company; bulletin boards or discussion forums; websites devoted to reviewing products of certain companies; blogs (which, in general, represent frequent and chronological publication of personal thoughts and web links from an individual or other entity); and other electronically available articles, papers, newspaper reviews, or other similar documents that represent the opinion or sentiment of consumers or reviewers of products. With all these sources of information, the amount of information that reviews certain nationally known, or internationally known, products can be staggering. The process of analyzing and processing such information into a meaningful report format can also be very difficult.